OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A probabilistic method for keyword retrieval in handwritten document images

Internal identifier: 000A97 (Main/Exploration); previous: 000A96; next: 000A98

Authors: Huaigu Cao [United States]; Anurag Bhardwaj [United States]; Venugopal Govindaraju [United States]

Source: Pattern recognition (ISSN 0031-3203); 2009

RBID : Pascal:09-0430764

Abstract

Keyword retrieval in handwritten document images is a challenging task because handwriting recognition does not perform well enough to produce reliable transcriptions, especially when using large lexicons. Existing methods build indices using OCR distances or image features for the purpose of retrieval. These alternative methods are complementary to the traditional approaches that build indices on OCR'ed text. In this paper, we describe an improvement to the existing keyword retrieval (word spotting) methods by modeling imperfect word segmentation as probabilities and integrating these probabilities into the word spotting algorithm. The scores returned by the word recognizer are also converted into probabilities and integrated into the probabilistic word spotting model.
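
The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a softmax to turn recognizer scores into probabilities and a plain sum over candidate word segmentations, and every name (`scores_to_probs`, `keyword_probability`) and number below is hypothetical.

```python
import math

def scores_to_probs(scores):
    """Convert raw recognizer scores (higher is better) into probabilities.

    Assumption: a softmax is used here for illustration only; the paper's
    actual score-to-probability conversion may differ.
    """
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def keyword_probability(keyword, hypotheses):
    """Probability that `keyword` occurs, summed over segmentation hypotheses.

    hypotheses: list of (p_seg, scores) pairs, where p_seg is the probability
    that a candidate word segmentation is correct and scores maps lexicon
    words to recognizer scores for the image cut out by that segmentation.
    """
    total = 0.0
    for p_seg, scores in hypotheses:
        probs = scores_to_probs(scores)
        total += p_seg * probs.get(keyword, 0.0)
    return total

# Toy data: two candidate segmentations of the same image region.
hypotheses = [
    (0.7, {"retrieval": 2.0, "retrieve": 1.0, "recital": 0.0}),
    (0.3, {"retrieval": 0.0, "retrieve": 0.0, "recital": 0.0}),
]
p = keyword_probability("retrieval", hypotheses)
```

Ranking image regions by such a probability, instead of trusting a single hard segmentation and a single top recognition result, is one way a retrieval system could tolerate segmentation and recognition errors.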


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A probabilistic method for keyword retrieval in handwritten document images</title>
<author>
<name sortKey="Huaigu Cao" sort="Huaigu Cao" uniqKey="Huaigu Cao" last="Huaigu Cao">HUAIGU CAO</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName>
<settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0430764</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0430764 INIST</idno>
<idno type="RBID">Pascal:09-0430764</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000211</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000568</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000214</idno>
<idno type="wicri:doubleKey">0031-3203:2009:Huaigu Cao:a:probabilistic:method</idno>
<idno type="wicri:Area/Main/Merge">000B08</idno>
<idno type="wicri:Area/Main/Curation">000A97</idno>
<idno type="wicri:Area/Main/Exploration">000A97</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A probabilistic method for keyword retrieval in handwritten document images</title>
<author>
<name sortKey="Huaigu Cao" sort="Huaigu Cao" uniqKey="Huaigu Cao" last="Huaigu Cao">HUAIGU CAO</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
</affiliation>
</author>
<author>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Center for Unified Biometrics and Sensors (CUBS), Department of Computer Science and Engineering, University at Buffalo</s1>
<s2>Amherst, NY 14260</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
<settlement type="city">Buffalo (New York)</settlement>
</placeName>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName>
<settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Alternative method</term>
<term>Document image processing</term>
<term>Handwriting recognition</term>
<term>Image processing</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Lexicon</term>
<term>Manuscript character</term>
<term>Modeling</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Approche probabiliste</term>
<term>Mot clé</term>
<term>Caractère manuscrit</term>
<term>Traitement image document</term>
<term>Reconnaissance écriture</term>
<term>Lexique</term>
<term>Reconnaissance optique caractère</term>
<term>Méthode alternative</term>
<term>Modélisation</term>
<term>Segmentation</term>
<term>Algorithme</term>
<term>Recherche information</term>
<term>Traitement image</term>
<term>Reconnaissance forme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Keyword retrieval in handwritten document images is a challenging task because handwriting recognition does not perform well enough to produce reliable transcriptions, especially when using large lexicons. Existing methods build indices using OCR distances or image features for the purpose of retrieval. These alternative methods are complementary to the traditional approaches that build indices on OCR'ed text. In this paper, we describe an improvement to the existing keyword retrieval (word spotting) methods by modeling imperfect word segmentation as probabilities and integrating these probabilities into the word spotting algorithm. The scores returned by the word recognizer are also converted into probabilities and integrated into the probabilistic word spotting model.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>État de New York</li>
</region>
<settlement>
<li>Buffalo (New York)</li>
</settlement>
<orgName>
<li>Université d'État de New York</li>
<li>Université d'État de New York à Buffalo</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="État de New York">
<name sortKey="Huaigu Cao" sort="Huaigu Cao" uniqKey="Huaigu Cao" last="Huaigu Cao">HUAIGU CAO</name>
</region>
<name sortKey="Bhardwaj, Anurag" sort="Bhardwaj, Anurag" uniqKey="Bhardwaj A" first="Anurag" last="Bhardwaj">Anurag Bhardwaj</name>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A97 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A97 | SxmlIndent | more
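
Outside Dilib, a record of this shape can also be read with a general-purpose XML parser. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. It assumes the record text is already available as a string (a shortened stand-in is used here). Because the `wicri:` and `inist:` prefixes are not declared in the record as displayed, the sketch binds them to placeholder URIs; those URIs and the helper name `parse_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the record shown on this page.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A probabilistic method for keyword retrieval in handwritten document images</title>
<author>
<name>Anurag Bhardwaj</name>
<affiliation wicri:level="4">
<country>États-Unis</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Handwriting recognition</term>
<term>Information retrieval</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace declarations (assumption: not the real URIs).
PLACEHOLDER_NS = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'

def parse_record(text):
    """Declare the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", "<record " + PLACEHOLDER_NS + ">", 1)
    return ET.fromstring(patched)

root = parse_record(RECORD)
title = root.findtext(".//titleStmt/title")
authors = [n.text for n in root.findall(".//titleStmt/author/name")]
keywords = [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")]
```

Without the added declarations, a namespace-aware parser would reject the `wicri:level` attribute as using an unbound prefix, which is why the root element is patched before parsing.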

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:09-0430764
   |texte=   A probabilistic method for keyword retrieval in handwritten document images
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024